Abstract

We obtain the Maximum Entropy distribution for an asset from call and digital option prices. A 
rigorous mathematical proof of its existence and exponential form is given, which can also be applied 
to legitimize a formal derivation by Buchen and Kelly [4]. We give a simple and robust algorithm for
our method and compare our results to theirs. Finally, we present numerical results which show that 
our approach implies very realistic volatility surfaces even when calibrating only to at-the-money 
options. 
1 Introduction

The recent market turbulence caused by the credit crunch has exposed in a drastic way the consequences 
of overconfidence in financial modelling assumptions. Typically, a financial model, such as the famous 
Black-Scholes model, will assume that the price of an asset follows a given stochastic process whose 
parameters need to be calibrated to market prices. If a model becomes an accepted standard and most 
market participants adopt it, problems can occur when assumptions that hold under normal market 
conditions are also expected to hold under abnormal ones. An example is the stock market crash of 
1987, where the volatilities used for pricing at-the-money options were also used for pricing far out-of- 
the-money put options. As the market headed downwards, it turned out that the true hedging cost for 
somebody who had sold such puts was far greater than the received premium. Another good example is 
described in the recent paper [5], where the authors demonstrate for CDOs and CDO$^2$s what can happen
to asset prices when model parameters that are hard to observe or estimate with sufficient accuracy are 
put to a true stress test. However, they write "The good news is that this mistake can be fixed. For 
example, a Bayesian approach that explicitly acknowledges that parameters are uncertain would go a 
long way towards solving this problem." [5] 

*We would like to thank David Chevance, Peter Jackel and Wolfgang Scherer for helpful comments and suggestions.
We would also like to thank the organizers of the WBS 5th Fixed Income Conference in Budapest, where we had the 
opportunity to present some of our results. 
Another well-established way to obtain estimates for such parameters from observable data, which
we will follow here, is via Maximum Entropy (ME) methods. Such an estimate "is the least biased
estimate possible on the given information, i.e., it is maximally noncommittal with regard to missing
information." [11]

The concept of Entropy, which has its origin in thermodynamics, nowadays has a broad range of ap- 
plications in physics, biology and, more recently, in finance and sociology. Boltzmann gave its Statistical 
Mechanics interpretation and Shannon defined it in terms of Information Theory. 

The fundamental postulate of Statistical Mechanics, roughly speaking, says that all possible micro- 
scopic states of a system have equal probabilities. The macroscopic equilibrium state is the one that 
allows for the highest number of possible microscopic states. The Entropy of a macroscopic state is a 
way to measure the number of microscopic states which correspond to this state. The equilibrium state 
is the one which maximizes Entropy. 

In Information Theory, when one has little information on the probability distribution of a random 
variable, the Entropy should be maximized to find the most unbiased distribution which agrees with that 
information. For example, if we have no information on a die being biased, maximizing Entropy under 
no constraints leads to a probability of 1/6 for each face. If, however, we know that the expected value 
for the outcome of a roll is, say, 2, then the uniform distribution does not agree with that information. 
What probability distribution should we consider in this case? From the Information Theory viewpoint 
the Entropy Maximizer, subject to the constraint of having expectation 2, is the best choice. 
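
The die example can be made concrete with a short numerical sketch. It assumes the standard fact that the Entropy Maximizer under a mean constraint has the exponential form p_k proportional to exp(λk); the function name `die_maxent` and the bisection bracket are illustrative choices, not part of the paper.

```python
import math

def die_maxent(target_mean, faces=(1, 2, 3, 4, 5, 6)):
    """Maximum Entropy probabilities on `faces` with a prescribed mean.

    Assumes the maximizer has the form p_k ~ exp(lam * k). The mean is
    strictly increasing in lam, so plain bisection is enough to find lam.
    """
    def probs(lam):
        weights = [math.exp(lam * k) for k in faces]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(lam):
        return sum(k * p for k, p in zip(faces, probs(lam)))

    lo, hi = -50.0, 50.0   # wide bracket (an assumption) covering any interior mean
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, probs(lam)

# Expected value 2, as in the text: the probabilities decay geometrically.
lam, p = die_maxent(2.0)
```

With a target mean of 3.5 the same routine returns λ close to zero, i.e. the uniform distribution of the unconstrained case.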

Let us finish this motivation by mentioning that the probability distribution over the interval [0, 1] 
which maximizes Entropy is the uniform distribution. There is no Entropy Maximizer for distributions 
over $\mathbb{R}$. However, when the mean and variance are specified, the Gauss distribution with these parameters
maximizes Entropy. 

We concentrate on the distribution of an asset price at a given time in the future, for which there are 
some option data. We develop a highly robust technique to find a Maximum Entropy density (MED) for 
the asset in case we have call and digital option prices. The density is obtained by partitioning the range 
of possible stock prices into buckets, i.e. the interval between adjacent strikes given by the option data, 
but, in contrast to the Black-Scholes model, making no a priori assumption about the asset's distribution. 
Instead, we maximize the Boltzmann-Shannon Entropy to obtain a distribution that only respects the 
given option prices and is otherwise unbiased. The density can in turn be used to interpolate implied 
volatilities and, repeating this for a range of maturities, to obtain a volatility surface. The results agree 
surprisingly well with observed volatility surfaces from the markets. 

Some authors have proposed similar Entropy Maximization methods to infer the probability distribution
for an asset from option prices ([1], [2] and [4]). This maximization problem corresponds to finding
a set of Lagrange multipliers. They present numerical methods to find such multipliers by solving an
$N$-dimensional problem (where $N$ is the number of constraints given by option prices). In the present work
this task is simplified by localizing the problem: the ME model finds, n = N/2 times, the root of a strictly
monotonic function in one variable whose derivative is known analytically, so that the Newton-method 
can be used for excellent speed of convergence and stability. This means that in practical applications 
the ME model can be calibrated to market data very quickly, which is crucial when using the model for 
many assets at many maturities or in a Monte Carlo simulation. 

In [2] the authors write "There is a problem with this type of calculation", meaning that the formal
Lagrange Multipliers approach is not mathematically rigorous. However, using a result of Csiszar we
show that the result can indeed be proven rigorously.

The density in our case differs slightly from the one mentioned above. We therefore investigate the 
differences between the two approaches. In both cases one can also use information from a so-called prior 
density, if available, leading to the concept of Relative Entropy, and we compare the densities obtained
by this method. 



2 The Maximum Entropy Distribution Using Calls and Digitals 

2.1 Maximum Entropy Distribution 

We are given a fixed maturity $T$, strictly increasing strikes $K_0 = 0, K_1, \ldots, K_n$, $K_{n+1} = \infty$, and
undiscounted call and digital prices

$$\tilde C_i := C(K_i, T)/DF(0,T), \qquad \tilde D_i := D(K_i, T)/DF(0,T)$$

at these strikes, where $DF(0,T)$ denotes the discount factor. Throughout we make the convention

$$\tilde C_{n+1} = K_{n+1}\tilde D_{n+1} = 0.$$

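Assuming risk neutral pricing, we will determine a density $g$ for the underlying asset price $S(T)$ which
maximizes Entropy

$$E(g) := -\int_0^\infty g(x)\ln g(x)\,dx$$

under the constraints

$$\mathbb{E}^{\mathbb{Q}}\big[(S(T) - K_i)^+\big] = \tilde C_i, \quad \text{i.e.} \quad \int_{K_i}^\infty (x - K_i)\,g(x)\,dx = \tilde C_i \tag{1}$$

and

$$\mathbb{E}^{\mathbb{Q}}\big[\mathbf{1}_{\{S(T) > K_i\}}\big] = \tilde D_i, \quad \text{i.e.} \quad \int_{K_i}^\infty g(x)\,dx = \tilde D_i \tag{2}$$

for all $i = 0, \ldots, n$. In particular, these two constraints for $i = 0$ mean that $g$ is a density, since
$\int_0^\infty g(x)\,dx = \tilde D_0 = 1$, and that the martingale condition

$$\mathbb{E}^{\mathbb{Q}}[S(T)] = \int_0^\infty x\,g(x)\,dx = F_T(S)$$

is satisfied, since $\tilde C_0 = F_T(S)$, the forward price of $S$ for time $T$.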
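From the second constraint it immediately follows that

$$\int_{K_i}^{K_{i+1}} g(x)\,dx = \tilde D_i - \tilde D_{i+1} \qquad \forall i = 0, \ldots, n. \tag{3}$$

Looking at a call spread with strikes $K_i$, $K_{i+1}$ raised to level $K_i$, i.e. a derivative that pays $S(T)$ if
$K_i < S(T) < K_{i+1}$ and zero otherwise, we obtain the condition

$$\int_{K_i}^{K_{i+1}} x\,g(x)\,dx = (\tilde C_i + K_i\tilde D_i) - (\tilde C_{i+1} + K_{i+1}\tilde D_{i+1}) \qquad \forall i = 0, \ldots, n. \tag{4}$$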

We now calculate the density $g$ under the constraints given above. The purpose of Theorem 2.2 is to
show that the local constraints (3) and (4) are equivalent to the global constraints (1) and (2). Moreover,

$$-\int_0^\infty g(x)\ln g(x)\,dx = \sum_{i=0}^n \left( -\int_{K_i}^{K_{i+1}} g(x)\ln g(x)\,dx \right),$$

and, thus, we only need to maximize $-\int_{K_i}^{K_{i+1}} g(x)\ln g(x)\,dx$ subject to (3) and (4) over each bucket.
Let $\mathcal{M}^+$ be the set of positive Borel-measurable functions defined on $[0,\infty[$. Define

$$\mathcal{X} := \left\{ g \in \mathcal{M}^+ \;\middle|\; \int_{K_i}^\infty g(x)\,dx = \tilde D_i,\ \int_{K_i}^\infty (x - K_i)\,g(x)\,dx = \tilde C_i \quad \forall i = 0, \ldots, n \right\},$$

$$\mathcal{X}_i := \left\{ g \in \mathcal{M}^+ \;\middle|\; \int_{K_i}^{K_{i+1}} g(x)\,dx = \tilde D_i - \tilde D_{i+1},\ \int_{K_i}^{K_{i+1}} x\,g(x)\,dx = (\tilde C_i + K_i\tilde D_i) - (\tilde C_{i+1} + K_{i+1}\tilde D_{i+1}) \right\}.$$

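Proposition 2.1 $\mathcal{X} = \bigcap_{i=0}^n \mathcal{X}_i$.

Proof We show $\mathcal{X} \subset \bigcap_{i=0}^n \mathcal{X}_i$. Let $g \in \mathcal{X}$. Then, for any $i$,

$$\int_{K_i}^{K_{i+1}} g(x)\,dx = \int_{K_i}^\infty g(x)\,dx - \int_{K_{i+1}}^\infty g(x)\,dx = \tilde D_i - \tilde D_{i+1}.$$

Moreover,

$$\tilde C_i - \tilde C_{i+1} = \int_{K_i}^\infty (x - K_i)\,g(x)\,dx - \int_{K_{i+1}}^\infty (x - K_{i+1})\,g(x)\,dx = \int_{K_i}^{K_{i+1}} x\,g(x)\,dx - K_i\tilde D_i + K_{i+1}\tilde D_{i+1}.$$

Therefore, $g \in \mathcal{X}_i$.

Conversely, we show $\mathcal{X} \supset \bigcap_{i=0}^n \mathcal{X}_i$. Suppose that $g \in \mathcal{X}_j$ for all $j = 0, \ldots, n$. Then

$$\int_{K_i}^\infty g(x)\,dx = \sum_{j=i}^n \int_{K_j}^{K_{j+1}} g(x)\,dx = \sum_{j=i}^n (\tilde D_j - \tilde D_{j+1}) = \tilde D_i - \underbrace{\tilde D_{n+1}}_{=0} = \tilde D_i.$$

Additionally,

$$\int_{K_i}^\infty (x - K_i)\,g(x)\,dx = \sum_{j=i}^n \int_{K_j}^{K_{j+1}} x\,g(x)\,dx - K_i \int_{K_i}^\infty g(x)\,dx$$

$$= \sum_{j=i}^n \left[ (\tilde C_j + K_j\tilde D_j) - (\tilde C_{j+1} + K_{j+1}\tilde D_{j+1}) \right] - K_i\tilde D_i$$

$$= (\tilde C_i + K_i\tilde D_i) - (\underbrace{\tilde C_{n+1}}_{=0} + \underbrace{K_{n+1}\tilde D_{n+1}}_{=0}) - K_i\tilde D_i = \tilde C_i.$$

It follows that $g \in \mathcal{X}$. □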



For $i = 0, \ldots, n$, we define

$$E_i(g) := -\int_{K_i}^{K_{i+1}} g(x)\ln g(x)\,dx \qquad \forall g \in \mathcal{X}_i.$$



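Theorem 2.2 If $g$ is a maximizer of $E$ on $\mathcal{X}$, then $g$ is a maximizer of $E_i$ on $\mathcal{X}_i$. Conversely, if $g$ is a
maximizer of $E_i$ on $\mathcal{X}_i$ for all $i = 0, \ldots, n$, then $g$ is a maximizer of $E$ on $\mathcal{X}$.

Proof Let $g$ be a maximizer of $E$ on $\mathcal{X}$, and let $h \in \mathcal{X}_i$. Define

$$\hat g(x) := \begin{cases} h(x) & \text{if } x \in [K_i, K_{i+1}[, \\ g(x) & \text{otherwise.} \end{cases}$$

Since $\hat g = h$ on $[K_i, K_{i+1}[$, we have $\hat g \in \mathcal{X}_i$. Moreover, for $j \ne i$, we have $\hat g = g$ on $[K_j, K_{j+1}[$,
and thus $\hat g \in \mathcal{X}_j$. It follows from Proposition 2.1 that $\hat g \in \mathcal{X}$. Hence, from the maximality of $E(g)$,
we get $E(g) - E(\hat g) \ge 0$. A simple computation gives $E(\hat g) = E(g) - E_i(g) + E_i(h)$, and therefore
$E_i(g) - E_i(h) = E(g) - E(\hat g) \ge 0$. It follows that $g$ maximizes $E_i$ on $\mathcal{X}_i$.

Conversely, suppose that $g$ is a maximizer of $E_i$ on $\mathcal{X}_i$ for all $i = 0, \ldots, n$. Let $h \in \mathcal{X}$. We have

$$E(g) = \sum_{i=0}^n E_i(g) \ge \sum_{i=0}^n E_i(h) = E(h),$$

which means that $g$ is a maximizer of $E$ on $\mathcal{X}$. □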

We now give a heuristic way of determining the Entropy Maximizer, but in the next subsection we also
give a rigorous proof that this is indeed the correct result. Formally applying the Lagrange Multipliers
Theorem, we conclude that the maximizer has the form

$$g(x) = \alpha_i e^{\beta_i x} \quad \text{on } [K_i, K_{i+1}[. \tag{5}$$

To see this, define the functionals

$$\mathcal{F}(g) := -\int_{K_i}^{K_{i+1}} g(x)\ln g(x)\,dx, \qquad \mathcal{G}(g) := \int_{K_i}^{K_{i+1}} g(x)\,dx, \qquad \mathcal{H}(g) := \int_{K_i}^{K_{i+1}} x\,g(x)\,dx$$

and solve the equation

$$\delta\mathcal{F}(g) + \lambda_1\,\delta\mathcal{G}(g) + \lambda_2\,\delta\mathcal{H}(g) = 0$$

for the Frechet derivatives. It follows that

$$\int_{K_i}^{K_{i+1}} (\ln g(x) + 1)\,\delta g(x)\,dx = \int_{K_i}^{K_{i+1}} (\lambda_1 + \lambda_2 x)\,\delta g(x)\,dx.$$

Therefore, on the interval $[K_i, K_{i+1}[$, we must have

$$\ln g(x) + 1 = \lambda_1 + \lambda_2 x,$$

and, introducing $\alpha_i := e^{\lambda_1 - 1}$ and $\beta_i := \lambda_2$, we obtain (5).
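Using the explicit form of $g$ just found in (3) and (4) gives

$$\alpha_i \int_{K_i}^{K_{i+1}} e^{\beta_i x}\,dx = \tilde D_i - \tilde D_{i+1}, \tag{6}$$

$$\alpha_i \int_{K_i}^{K_{i+1}} x\,e^{\beta_i x}\,dx = (\tilde C_i + K_i\tilde D_i) - (\tilde C_{i+1} + K_{i+1}\tilde D_{i+1}) \tag{7}$$

for all $i = 0, \ldots, n$. For $i < n$, solving (6) for $\alpha_i$ and then (7) for $\beta_i$, using integration by parts, gives

$$\alpha_i = \frac{\beta_i\,(\tilde D_i - \tilde D_{i+1})}{e^{\beta_i K_{i+1}} - e^{\beta_i K_i}}, \tag{8}$$

$$\frac{K_{i+1}e^{\beta_i K_{i+1}} - K_i e^{\beta_i K_i}}{e^{\beta_i K_{i+1}} - e^{\beta_i K_i}} - \frac{1}{\beta_i} = \frac{(\tilde C_i + K_i\tilde D_i) - (\tilde C_{i+1} + K_{i+1}\tilde D_{i+1})}{\tilde D_i - \tilde D_{i+1}}. \tag{9}$$

Define

$$\Theta(\beta; K_i, K_{i+1}) := \frac{K_{i+1}e^{\beta K_{i+1}} - K_i e^{\beta K_i}}{e^{\beta K_{i+1}} - e^{\beta K_i}} - \frac{1}{\beta}.$$

It follows that

$$\Theta'(\beta; K_i, K_{i+1}) = \frac{1}{\beta^2} - \frac{(K_{i+1} - K_i)^2\, e^{\beta(K_i + K_{i+1})}}{(e^{\beta K_{i+1}} - e^{\beta K_i})^2}.$$

Figure 1 shows the graphs of $\Theta(\beta; K_i, K_{i+1})$ and $\Theta'(\beta; K_i, K_{i+1})$ for $K_i = 10$ and $K_{i+1} = 30$. It
suggests that equation (9) has a unique solution if the quantity on the right hand side is in $]K_i, K_{i+1}[$.
This turns out to be the case, as we show with the following proposition.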

Proposition 2.3 Let $i \in \{0, \ldots, n\}$. If there is no arbitrage opportunity implied by $\tilde D_i, \tilde D_{i+1}, \tilde C_i, \tilde C_{i+1}$,
then there is a unique solution $(\alpha_i, \beta_i)$ for equations (6) and (7).

Proof Define

$$\bar K := \frac{(\tilde C_i + K_i\tilde D_i) - (\tilde C_{i+1} + K_{i+1}\tilde D_{i+1})}{\tilde D_i - \tilde D_{i+1}}.$$

We first show that we must have $K_i < \bar K < K_{i+1}$. This can be seen by comparing a derivative that pays
$\bar K$ if $K_i < S(T) < K_{i+1}$, i.e. $\bar K(\tilde D_i - \tilde D_{i+1})$, to the raised call spread

$$V := (\tilde C_i + K_i\tilde D_i) - (\tilde C_{i+1} + K_{i+1}\tilde D_{i+1})$$

that pays $S(T)$ if $K_i < S(T) < K_{i+1}$. If $V \le K_i(\tilde D_i - \tilde D_{i+1})$ or $V \ge K_{i+1}(\tilde D_i - \tilde D_{i+1})$, then there is
clearly an arbitrage opportunity.

Next we show that if $K_i < \bar K < K_{i+1}$, then there is a unique solution $(\alpha_i, \beta_i)$ for equations (6) and (7).
We begin with the case $i < n$. As we have just seen, (6) and (7) are then equivalent to (8) and (9).
Without loss of generality, we may assume $K_i = 0$ and $K_{i+1} = 1$. Indeed, it is straightforward to see
that the change of variables

$$(\beta, \bar K) \mapsto (x, \lambda), \qquad x := \beta(K_{i+1} - K_i), \qquad \lambda := \frac{\bar K - K_i}{K_{i+1} - K_i}$$

transforms the equation $\Theta(\beta; K_i, K_{i+1}) = \bar K$ into $\Theta(x; 0, 1) = \lambda$, with $\lambda \in\, ]0, 1[$.

Figure 1: Graphs of $\Theta(\beta; 10, 30)$ and $\Theta'(\beta; 10, 30)$ (here $'$ means derivative with respect to $\beta$).

Using l'Hospital's rule we obtain that the function $F : \mathbb{R} \to \mathbb{R}$ given by

$$F(x) := \begin{cases} \dfrac{e^x}{e^x - 1} - \dfrac{1}{x} & \text{if } x \ne 0, \\[1ex] \dfrac{1}{2} & \text{if } x = 0 \end{cases}$$

is a continuous extension of $\Theta(\cdot\,; 0, 1)$. It is easy to see that $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$. Hence
the equation $F(x) = \lambda$ has a solution. To prove that the solution is unique, we shall now show that
$F$ is strictly increasing. Again by l'Hospital's rule, we obtain that $F$ is differentiable at $x = 0$ and
$F'(0) = 1/12$ (this is particularly useful because $x = 0$ is an ideal starting point for the Newton-Raphson
method). For $x \ne 0$ we have

$$F'(x) = \frac{1}{x^2} - \frac{e^x}{(e^x - 1)^2}.$$

Recall that $x^{-1}\sinh x > 1$ for all $x \ne 0$. Applying this with $x/2$ in place of $x$ gives

$$\frac{e^{x/2} - e^{-x/2}}{x} > 1 \;\Longrightarrow\; \frac{e^x - 1}{x} > e^{x/2} \;\Longrightarrow\; \frac{1}{x^2} > \frac{e^x}{(e^x - 1)^2} \;\Longrightarrow\; F'(x) > 0.$$

We conclude that $F'(x) > 0$ for all $x \in \mathbb{R}$. Therefore $F$ is strictly increasing. Finally, we consider the
case $i = n$. Equations (6) and (7) then become

$$\alpha_n \int_{K_n}^\infty e^{\beta_n x}\,dx = \tilde D_n, \qquad \alpha_n \int_{K_n}^\infty x\,e^{\beta_n x}\,dx = \tilde C_n + K_n\tilde D_n.$$

The first equation implies that $\beta_n < 0$. Solving it for $\alpha_n$ and the second equation for $\beta_n$ gives

$$\alpha_n = -\beta_n \tilde D_n e^{-\beta_n K_n}, \qquad \beta_n = -\frac{\tilde D_n}{\tilde C_n}.$$

□



Note that we have shown that F is itself a continuously differentiable probability distribution function. 
For such a function there might already exist an inversion-algorithm. 
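
The bucket-wise calibration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the series cut-off near x = 0, the stopping rule and the use of the symmetry F(-x) = 1 - F(x) are all choices made here, and there are no safeguards against overflow for extreme inputs. The inputs of the usage example are the one-strike values printed in Table 1.

```python
import math

def F(x):
    """Theta(x; 0, 1) = e^x/(e^x - 1) - 1/x, extended by F(0) = 1/2."""
    if abs(x) < 1e-2:                  # Taylor series avoids cancellation near 0
        return 0.5 + x / 12.0 - x**3 / 720.0 + x**5 / 30240.0
    if x < 0.0:                        # symmetry F(-x) = 1 - F(x)
        return 1.0 - F(-x)
    return 1.0 / (-math.expm1(-x)) - 1.0 / x

def dF(x):
    """F'(x) = 1/x^2 - e^x/(e^x - 1)^2, with F'(0) = 1/12."""
    x = abs(x)                         # F' is an even function
    if x < 1e-2:
        return 1.0 / 12.0 - x**2 / 240.0 + x**4 / 6048.0
    em = math.exp(-x)
    return 1.0 / x**2 - em / (1.0 - em) ** 2

def solve_F(lam, tol=1e-12, max_iter=100):
    """Newton-Raphson for F(x) = lam with 0 < lam < 1, started at x = 0."""
    x = 0.0
    for _ in range(max_iter):
        step = (lam - F(x)) / dF(x)
        x += step
        if abs(step) <= tol * (1.0 + abs(x)):
            break
    return x

def calibrate_bucket(K_lo, K_hi, C_lo, D_lo, C_hi, D_hi):
    """Return (alpha, beta) on the bounded bucket [K_lo, K_hi[ via (8) and (9)."""
    mass = D_lo - D_hi
    K_bar = ((C_lo + K_lo * D_lo) - (C_hi + K_hi * D_hi)) / mass
    width = K_hi - K_lo
    x = solve_F((K_bar - K_lo) / width)        # rescaled problem F(x) = lambda
    beta = x / width
    ratio = 1.0 if x == 0.0 else x / math.expm1(x)
    alpha = mass / width * ratio * math.exp(-beta * K_lo)   # equation (8)
    return alpha, beta

def calibrate_last_bucket(K_n, C_n, D_n):
    """Closed form on [K_n, oo[: beta_n = -D_n/C_n, alpha_n = -beta_n D_n e^{-beta_n K_n}."""
    beta = -D_n / C_n
    return -beta * D_n * math.exp(-beta * K_n), beta

# One-strike example (K_0 = 0, K_1 = 100) with the market prices of Table 1.
alpha0, beta0 = calibrate_bucket(0.0, 100.0, 100.0, 1.0, 9.9477, 0.4503)
alpha1, beta1 = calibrate_last_bucket(100.0, 9.9477, 0.4503)
```

Up to the rounding of the printed inputs, the resulting parameters should agree with the "ME 1 Strike" rows of Table 1.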



2.2 A Rigorous Way of Finding the Entropy Maximizer 

Like others, we have formally derived the expression for the Entropy Maximizer using the Lagrange
Multipliers method. However, as pointed out in [2], "there is a problem with this type of calculation".
Recall that the Lagrange Multipliers Theorem requires continuous differentiability for objective and
constraint functionals in a neighbourhood of the maximizer. However, the Boltzmann-Shannon Entropy
functional is finite only for densities in

$$O := \{ g \in L^1(0,\infty) \mid g\ln g \in L^1(0,\infty) \},$$

which has empty interior in $L^1(0,\infty)$. Therefore a maximizer is not an interior point of $O$. Even worse,
the Entropy is far from being continuously differentiable since it is nowhere continuous.

In [2], convex programming arguments are considered to circumvent this problem. Here we present
a new approach based on a result by Csiszar.

When no prior density is given we are interested in the (non relative) Entropy of $g$

$$E_I(g) := -\int_I g(x)\ln g(x)\,dx,$$

where $I \subset [0,\infty[$ is an interval. However, Csiszar's results deal with Relative Entropy of a probability
density $g$ with respect to a probability measure $R$ on $I$

$$E_I(g|R) := -\int_I g(x)\ln g(x)\,dR(x).$$

Roughly speaking, we are interested in the "Relative Entropy" with respect to the Lebesgue measure
which, in general, is not a probability measure.

For $i = 0, \ldots, n-1$, $I = [K_i, K_{i+1}[$ is bounded. In that case, the problem can easily fit in Csiszar's
framework by considering the normalized Lebesgue probability measure $dR(x) = (K_{i+1} - K_i)^{-1}dx$.
However, it is impossible to use this trick for the global problem, $I = [0,\infty[$, and for the last bucket,
$I = [K_n, \infty[$, since there is no normalization constant which turns the Lebesgue measure into a probability
measure on these intervals.
Nevertheless, it is possible to turn the two problems over unbounded intervals into equivalent ones

that do fit in Csiszar's framework. This is the subject of the next proposition. Moreover, the same 
arguments also apply to bounded intervals. Therefore, we do not need to make any distinction between 
bounded and unbounded intervals. 

For the sake of simplicity, the statement in the following proposition considers only two main con- 
straints, namely, the total mass and the mean. This includes the bucket problems and excludes the 
global problem (where additional constraints are given). However, the proof works even for an infinite 
number of constraints provided the two main ones are among them. 

Proposition 2.4 Let $I \subset [0,\infty[$ be an interval. Define $m(x) = \theta e^{-x}$ for all $x \in I$, where $\theta > 0$ is a
normalization constant such that $dR(x) = m(x)\,dx$ is a probability measure on $I$. Let $a_0, a_1 > 0$. Then
the mapping $g \mapsto g/m$ is a bijection from

$$\Omega := \left\{ g \in \mathcal{M}^+ \;\middle|\; \int_I g(x)\,dx = a_0,\ \int_I x\,g(x)\,dx = a_1 \right\}$$

onto

$$\tilde\Omega := \left\{ \tilde g \in \mathcal{M}^+ \;\middle|\; \int_I \tilde g(x)\,dR(x) = a_0,\ \int_I x\,\tilde g(x)\,dR(x) = a_1 \right\}.$$

Moreover, $g$ is a maximizer of $E_I$ on $\Omega$ if and only if $g/m$ is a maximizer of $E_I(\cdot|R)$ on $\tilde\Omega$.

Proof Define $\Psi : \mathcal{M}^+ \to \mathcal{M}^+$ by $\Psi(g) := g/m$. Since $m$ is strictly positive, it follows immediately
that $\Psi$ is a well defined bijection.

We shall show that $\Psi$ preserves some linear functionals. Let $g \in \mathcal{M}^+$ and $f : I \to \mathbb{R}$. Then we have

$$\int_I f(x)\,\big(\Psi(g)(x)\big)\,dR(x) = \int_I f(x)\,\frac{g(x)}{m(x)}\,dR(x) = \int_I f(x)\,g(x)\,dx.$$

In particular, applying this result to $f(x) = 1$ and to $f(x) = x$, it follows immediately that $\Psi$ maps $\Omega$
onto $\tilde\Omega$.

To complete the proof it suffices to show that if $g, h \in \Omega$, then

$$E_I(g) - E_I(h) \ge 0 \iff E_I(\Psi(g)|R) - E_I(\Psi(h)|R) \ge 0.$$

In fact, we shall show a stronger result, namely, that the two differences above are equal. This is
equivalent to showing that $E_I(\Psi(g)|R) - E_I(g)$ does not depend on $g \in \Omega$. We have

$$E_I(\Psi(g)|R) = -\int_I \big(\Psi(g)(x)\big)\ln\big(\Psi(g)(x)\big)\,dR(x) = -\int_I \frac{g(x)}{m(x)}\ln\left(\frac{g(x)}{m(x)}\right)dR(x)$$

$$= -\int_I g(x)\ln\left(\frac{g(x)}{m(x)}\right)dx = -\int_I g(x)\ln g(x)\,dx + \int_I g(x)\ln m(x)\,dx$$

$$= E_I(g) + \ln\theta \int_I g(x)\,dx - \int_I x\,g(x)\,dx = E_I(g) + a_0\ln\theta - a_1.$$

□



Later we will restate and apply a partial version of a theorem by Csiszar. Before doing so, let us say
a few words about it.

It is very natural to apply the Lagrange Multipliers Theorem for maximization problems under
constraints. However, there are many cases where other techniques are used, for instance in the proof
of the existence of projection on a convex set of a Hilbert space. In that case, geometric arguments,
including the Parallelogram Identity, are used.

Many texts suggest thinking of the Relative Entropy of one probability measure with respect to
another as a quantity measuring how much they differ. Moreover, they present some similarities between
relative Entropy and a metric. Unfortunately, they say, this analogy does not go too far. Csiszar's
paper pushes these similarities a bit further, showing a relation analogous to the Parallelogram Identity.
Furthermore, he proves the existence of an Entropy Minimizer$^1$ under convex constraints by arguments
similar to those that show the existence of projection on convex subsets of Hilbert spaces.

We restate here a partial version of his Theorem 3.1 sufficient for our purposes.

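Theorem 2.5 (Csiszar) Let $R$ be a probability on a measurable space $(X, \mathcal{A})$. Let $\{f_\gamma\}_{\gamma \in \Gamma}$ be an
arbitrary set of real-valued $\mathcal{A}$-measurable functions on $X$ and $\{a_\gamma\}_{\gamma \in \Gamma}$ be real constants. Let $\mathcal{E}$ be the set
of all those probabilities $P$ on $(X, \mathcal{A})$ for which the integrals $\int f_\gamma\,dP$ exist and equal $a_\gamma$ ($\gamma \in \Gamma$). Then,
if there exists $Q \in \mathcal{E}$ such that $Q \ll R$ and its Radon-Nikodym derivative has the form

$$\frac{dQ}{dR}(x) = c\,e^{q(x)} \qquad \forall x \in X, \tag{10}$$

where $c > 0$ and $q$ belongs to the linear space spanned by the $f_\gamma$'s, then

$$\int \frac{dQ}{dR}\ln\left(\frac{dQ}{dR}\right)dR \le \int \frac{dP}{dR}\ln\left(\frac{dP}{dR}\right)dR$$

for all $P \in \mathcal{E}$ such that $P \ll R$.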

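Now we prove that $g$, given by (5), (8) and (9), is indeed an Entropy Maximizer.

Theorem 2.6 Let $i \in \{0, \ldots, n\}$, $I = [K_i, K_{i+1}[$. Let $\alpha_i$ and $\beta_i$ be defined by equations (8) and (9). Then
$g : I \to \mathbb{R}$ given by

$$g(x) = \alpha_i e^{\beta_i x} \qquad \forall x \in I$$

maximizes $E_i$ on $\mathcal{X}_i$.

Proof Set $a_0 = \tilde D_i - \tilde D_{i+1}$ and $a_1 = (\tilde C_i + K_i\tilde D_i) - (\tilde C_{i+1} + K_{i+1}\tilde D_{i+1})$. Let $m$, $R$, $\Omega$ and $\tilde\Omega$ be as in
Proposition 2.4. Note that for this choice of $I$, $a_0$ and $a_1$, we have $\Omega = \mathcal{X}_i$ and $E_I = E_i$.

Let $X = I$, $\mathcal{A}$ be the $\sigma$-algebra of Lebesgue measurable subsets of $I$, $\Gamma = \{0, 1\}$, $f_\gamma(x) = a_0 x^\gamma$ ($\gamma \in \Gamma$)
and $\mathcal{E}$ as in Csiszar's theorem.

Given $\tilde h \in \tilde\Omega$, define the measure $P_{\tilde h}$ by $dP_{\tilde h}(x) = a_0^{-1}\tilde h(x)\,dR(x)$. From the definition of $\tilde\Omega$, it follows
that $P_{\tilde h} \in \mathcal{E}$. Conversely, if $P \in \mathcal{E}$ and $P \ll R$, then $a_0 \cdot dP/dR \in \tilde\Omega$. Then a simple computation yields

$$-\int_I \frac{dP_{\tilde h}}{dR}\ln\left(\frac{dP_{\tilde h}}{dR}\right)dR = -a_0^{-1}\int_I \tilde h(x)\ln\tilde h(x)\,dR(x) + \ln(a_0) = a_0^{-1}E_I(\tilde h|R) + \ln(a_0). \tag{11}$$

By definition of $\alpha_i$ and $\beta_i$ we have $g \in \mathcal{X}_i = \Omega$. Proposition 2.4 yields $\tilde g := g/m \in \tilde\Omega$. Moreover,
$\tilde g(x) = \alpha_i\theta^{-1}e^{(\beta_i + 1)x}$ for all $x \in I$.

Let $Q = P_{\tilde g}$. It follows that $Q \in \mathcal{E}$, and its Radon-Nikodym derivative with respect to $R$ (which is
$a_0^{-1}\tilde g$) has the form (10) with $c = a_0^{-1}\alpha_i\theta^{-1}$ and $q = (\beta_i + 1)a_0^{-1}f_1$. Therefore, Csiszar's theorem gives

$$\int_I \frac{dQ}{dR}\ln\left(\frac{dQ}{dR}\right)dR \le \int_I \frac{dP}{dR}\ln\left(\frac{dP}{dR}\right)dR$$

for all $P \in \mathcal{E}$ such that $P \ll R$. In particular, for all $\tilde h \in \tilde\Omega$, from (11) we obtain

$$E_I(\tilde g|R) \ge E_I(\tilde h|R).$$

We conclude that $\tilde g$ maximizes $E_I(\cdot|R)$ on $\tilde\Omega$ and, again by Proposition 2.4, that $g$ is a maximizer of
$E_i$ on $\mathcal{X}_i$. □

$^1$In Csiszar's paper, the minus sign in front of Entropy's definition is dropped and its minimization (rather than
maximization) is studied.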



2.3 Some Results Regarding the Entropy Maximizer 

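We have the explicit form of the density given by equation (5). This allows us to give formulas in several
important cases. To do this, we first state two useful results for the following proofs.

For $K \in [K_i, K_{i+1}[$ and $\beta_i \ne 0$, we have

$$\int_{K_i}^K g(x)\,dx = \alpha_i\int_{K_i}^K e^{\beta_i x}\,dx = \frac{\alpha_i}{\beta_i}\left(e^{\beta_i K} - e^{\beta_i K_i}\right), \tag{12}$$

$$\int_{K_i}^K x\,g(x)\,dx = \alpha_i\int_{K_i}^K x\,e^{\beta_i x}\,dx = \frac{\alpha_i}{\beta_i}\left(K e^{\beta_i K} - K_i e^{\beta_i K_i}\right) - \frac{\alpha_i}{\beta_i^2}\left(e^{\beta_i K} - e^{\beta_i K_i}\right). \tag{13}$$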

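It is straightforward to integrate the density $g$ and obtain an explicit form of the probability distribution

$$G(x) := \int_0^x g(s)\,ds.$$

Its inverse can also be expressed analytically, which is a useful feature for Monte Carlo simulations. The
results are stated in the following proposition.

Proposition 2.7 Suppose $K \in [K_i, K_{i+1}[$. Then

$$G(K) = \begin{cases} 1 - \tilde D_i + \dfrac{\alpha_i}{\beta_i}\left(e^{\beta_i K} - e^{\beta_i K_i}\right) & \text{if } \beta_i \ne 0, \\[1ex] 1 - \tilde D_i + \alpha_i (K - K_i) & \text{if } \beta_i = 0. \end{cases}$$

Given $L \in\, ]0, 1[$, find $i \in \{0, \ldots, n\}$ such that $1 - L \in\, ]\tilde D_{i+1}, \tilde D_i]$. Then

$$G^{-1}(L) = \begin{cases} \dfrac{1}{\beta_i}\ln\left(e^{\beta_i K_i} + \dfrac{\beta_i}{\alpha_i}\left(\tilde D_i - 1 + L\right)\right) & \text{if } \beta_i \ne 0, \\[1ex] K_i + \dfrac{\tilde D_i - 1 + L}{\alpha_i} & \text{if } \beta_i = 0. \end{cases}$$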
Proof We treat only the case $\beta_i \ne 0$. The simpler case $\beta_i = 0$ is left to the reader.
First, notice that $G(K_i) = 1 - \tilde D_i$. Then, using (12), we get

$$G(K) = \int_0^{K_i} g(x)\,dx + \int_{K_i}^K g(x)\,dx = 1 - \tilde D_i + \frac{\alpha_i}{\beta_i}\left(e^{\beta_i K} - e^{\beta_i K_i}\right). \tag{14}$$

Since $L \in [1 - \tilde D_i, 1 - \tilde D_{i+1}[\, = [G(K_i), G(K_{i+1})[$, we have $G^{-1}(L) \in [K_i, K_{i+1}[$. Therefore solving
(14) for $K = G^{-1}(L)$ concludes the proof. □
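
As an illustration of Proposition 2.7, the following sketch evaluates the distribution function and its inverse for the one-strike example of Section 4. The list layout and function names are assumptions made here. The value of β for the first bucket is the rounded figure printed in Table 1, with α then recomputed from (8) so that the distribution function stays continuous; the last bucket uses the closed form obtained in the proof of Proposition 2.3.

```python
import math

# Two buckets of the one-strike example: [0, 100[ and [100, oo[.
K = [0.0, 100.0]                 # left end points K_i
D = [1.0, 0.4503, 0.0]           # undiscounted digitals, with D_{n+1} = 0
C_1 = 9.9477                     # undiscounted call at K_1
beta = [0.0539, -D[1] / C_1]     # beta_0 rounded (Table 1), beta_1 in closed form
alpha = [
    beta[0] * (D[0] - D[1]) / (math.exp(beta[0] * K[1]) - math.exp(beta[0] * K[0])),
    -beta[1] * D[1] * math.exp(-beta[1] * K[1]),
]

def cdf(x):
    """G(x) from Proposition 2.7 for x >= 0."""
    i = max(j for j in range(len(K)) if K[j] <= x)
    if beta[i] == 0.0:
        return 1.0 - D[i] + alpha[i] * (x - K[i])
    return 1.0 - D[i] + alpha[i] / beta[i] * (math.exp(beta[i] * x) - math.exp(beta[i] * K[i]))

def inverse_cdf(L):
    """G^{-1}(L) for 0 < L < 1: pick i with D_{i+1} < 1 - L <= D_i, then invert."""
    i = next(j for j in range(len(K)) if D[j + 1] < 1.0 - L <= D[j])
    if beta[i] == 0.0:
        return K[i] + (D[i] - 1.0 + L) / alpha[i]
    arg = math.exp(beta[i] * K[i]) + beta[i] / alpha[i] * (D[i] - 1.0 + L)
    return math.log(arg) / beta[i]

# Stratified "uniform draws" (mid-points) mapped through G^{-1}.
N = 20000
samples = [inverse_cdf((k + 0.5) / N) for k in range(N)]
sample_mean = sum(samples) / N   # should be close to the forward, 100
```

In a Monte Carlo setting the mid-points would be replaced by pseudo-random uniforms; each draw costs one logarithm, as remarked in Section 4.1.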

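It is also straightforward to express the prices of call and digital options analytically.

Proposition 2.8 Given a strike $K \in [0,\infty[$, find $i \in \{0, \ldots, n\}$ such that $K \in [K_i, K_{i+1}[$. If $\beta_i \ne 0$,
then

$$\tilde D(K) = \tilde D_i - \frac{\alpha_i}{\beta_i}\left(e^{\beta_i K} - e^{\beta_i K_i}\right),$$

$$\tilde C(K) = \tilde C_i - (K - K_i)\left(\tilde D_i + \frac{\alpha_i}{\beta_i}e^{\beta_i K_i}\right) + \frac{\alpha_i}{\beta_i^2}\left(e^{\beta_i K} - e^{\beta_i K_i}\right).$$

If $\beta_i = 0$, then

$$\tilde D(K) = \tilde D_i - \alpha_i (K - K_i),$$

$$\tilde C(K) = \tilde C_i - (K - K_i)\tilde D_i + \frac{\alpha_i}{2}(K - K_i)^2.$$

Proof Again we prove only the case $\beta_i \ne 0$. From (12) we obtain

$$\tilde D(K) = \int_K^\infty g(x)\,dx = \int_{K_i}^\infty g(x)\,dx - \int_{K_i}^K g(x)\,dx = \tilde D_i - \frac{\alpha_i}{\beta_i}\left(e^{\beta_i K} - e^{\beta_i K_i}\right). \tag{15}$$

For the (undiscounted) call price we have

$$\tilde C(K) + K\tilde D(K) = \int_K^\infty x\,g(x)\,dx = \int_{K_i}^\infty x\,g(x)\,dx - \int_{K_i}^K x\,g(x)\,dx = \tilde C_i + K_i\tilde D_i - \int_{K_i}^K x\,g(x)\,dx. \tag{16}$$

Now putting (13) and (15) into (16) leads to the stated result. □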
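
The formulas of Proposition 2.8 translate directly into code. The sketch below is illustrative only: the function name and argument order are assumptions, and the inputs are the rounded "ME 1 Strike" parameters printed in Table 1, so the outputs agree with the tabulated prices only up to that rounding.

```python
import math

def med_digital_and_call(K, K_i, C_i, D_i, alpha_i, beta_i):
    """Undiscounted digital and call prices at a strike K in [K_i, K_{i+1}[."""
    dK = K - K_i
    if beta_i == 0.0:
        digital = D_i - alpha_i * dK
        call = C_i - dK * D_i + 0.5 * alpha_i * dK**2
        return digital, call
    e_K, e_Ki = math.exp(beta_i * K), math.exp(beta_i * K_i)
    ratio = alpha_i / beta_i
    digital = D_i - ratio * (e_K - e_Ki)
    call = C_i - dK * (D_i + ratio * e_Ki) + ratio / beta_i * (e_K - e_Ki)
    return digital, call

# Strike 80 lies in the bucket [0, 100[, strike 120 in the bucket [100, oo[.
d80, c80 = med_digital_and_call(80.0, 0.0, 100.0, 1.0, 1.3582e-4, 0.0539)
d120, c120 = med_digital_and_call(120.0, 100.0, 9.9477, 0.4503, 1.8835, -0.0453)
```

At K = K_i the function returns the input pair (D_i, C_i), which is a convenient sanity check when wiring it to calibrated parameters.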

Finally, using Euler's relationship for homogeneous functions, we can also give an explicit formula for
spot-delta.

Corollary 2.9 Given a strike $K \in [0,\infty[$, find $i \in \{0, \ldots, n\}$ such that $K \in [K_i, K_{i+1}[$. Let $S$ be today's
underlying spot price and $\Delta$ be the spot-delta of a call with strike $K$ maturing at $T$. If $\beta_i \ne 0$, then

$$\Delta = \frac{DF(0,T)}{S}\left(\tilde C_i + K_i\tilde D_i - \frac{\alpha_i}{\beta_i}\left(K e^{\beta_i K} - K_i e^{\beta_i K_i}\right) + \frac{\alpha_i}{\beta_i^2}\left(e^{\beta_i K} - e^{\beta_i K_i}\right)\right).$$

If $\beta_i = 0$, then

$$\Delta = \frac{DF(0,T)}{S}\left(\tilde C_i + K_i\tilde D_i - \frac{\alpha_i}{2}\left(K^2 - K_i^2\right)\right).$$

Proof Again we consider only the case $\beta_i \ne 0$ and leave the simpler case $\beta_i = 0$ for the reader.

Let $C$ and $D$ be, respectively, the discounted prices of call and digital options with strike $K$ maturing
at $T$, i.e. $C = DF(0,T)\tilde C(K)$ and $D = DF(0,T)\tilde D(K)$, where $\tilde C$ and $\tilde D$ are as in Proposition 2.8.

Since $C$ is a positively homogeneous function of degree 1 in $(K, S)$, from Euler's Theorem we have

$$C = S\frac{\partial C}{\partial S} + K\frac{\partial C}{\partial K}.$$

Recalling that $D = -\frac{\partial C}{\partial K}$, we can rewrite the last relation as

$$S\Delta = C + KD = DF(0,T)\left(\tilde C(K) + K\tilde D(K)\right).$$

Now using (13) and (16) gives the result. □

Note that the analogous statement for the forward-delta can be obtained by replacing the spot-price
$S$ with the forward-price $F$ in the corollary and proof above.

2.4 Maximum Entropy Distribution With a Given Prior Distribution 

If we hold a prior belief about the distribution, we can maximize Relative Entropy instead in order to
stay as close as possible to the prior distribution. Suppose $p(x)$ is a probability density for this prior
distribution. For $P_h \ll P_p$, define Relative Entropy

$$E(h|p) := -\int_0^\infty h(x)\ln\left(\frac{h(x)}{p(x)}\right)dx.$$

(Usually the Kullback-Leibler Divergence is given by $E_{KL}(h|p) = -E(h|p)$. This can be thought of as
a measure of distance between two distributions. For example, $E_{KL}(h|p) \ge 0\ \forall h$, and $E_{KL}(h|p) = 0$ if
and only if $h = p$.) We have

$$E(h|p) = -\int_0^\infty h(x)\ln\left(\frac{h(x)}{p(x)}\right)dx = -\int_0^\infty \frac{h(x)}{p(x)}\ln\left(\frac{h(x)}{p(x)}\right)p(x)\,dx$$

and essentially the same argument as the one given above shows that

$$\frac{h(x)}{p(x)} = \gamma_i e^{\delta_i x}, \qquad x \in [K_i, K_{i+1}[. \tag{17}$$

Therefore the resulting density $h = gp$ is now given by the product of a piecewise exponential density
and the prior density.

Even in the simple case where the prior density $p$ is just log-normal, we no longer have explicit
formulas for call and digital prices. Since we cannot separate the two constraints

$$\int_{K_i}^{K_{i+1}} h(x)\,dx = \tilde D_i - \tilde D_{i+1},$$

$$\int_{K_i}^{K_{i+1}} x\,h(x)\,dx = (\tilde C_i + K_i\tilde D_i) - (\tilde C_{i+1} + K_{i+1}\tilde D_{i+1})$$

for each $i = 0, \ldots, n$, as in equations (6) and (7), we must solve them simultaneously using numerical
integration and a two-dimensional root-finder.

However, if the prior density $p$ is already given by an MED, then

$$h(x) = g(x)p(x) = \alpha_i\gamma_i e^{(\beta_i + \delta_i)x}, \qquad x \in [K_i, K_{i+1}[,$$

and we can solve everything analytically as before. We also recover explicit formulas for call and digital 
prices. 

3 The Maximum Entropy Distribution Using Calls 

3.1 Maximum Entropy Distribution 

Buchen and Kelly [4] propose a similar method to find an Entropy-maximizing density $g_{BK}$ under
constraints given by European payoffs. The case of most interest is where these are the payoff-functions
of call options at different strikes $K_1, \ldots, K_m$ and the actual constraints are given by (undiscounted) call
option prices $\tilde C_1, \ldots, \tilde C_m$ such that

$$\mathbb{E}^{\mathbb{Q}}\big[(S(T) - K_i)^+\big] = \tilde C_i$$

must hold for all $i = 1, \ldots, m$.

The density $g_{BK}$ must therefore satisfy the conditions

$$\int_0^\infty (x - K_i)^+ g_{BK}(x)\,dx = \tilde C_i \qquad \forall i = 1, \ldots, m \tag{18}$$

and

$$\int_0^\infty g_{BK}(x)\,dx = 1. \tag{19}$$

To find $g_{BK}$, they construct the functional

$$\mathcal{H}(g_{BK}) := -\int_0^\infty g_{BK}(x)\ln g_{BK}(x)\,dx + (1 + \lambda_0)\int_0^\infty g_{BK}(x)\,dx + \sum_{i=1}^m \lambda_i\int_0^\infty (x - K_i)^+ g_{BK}(x)\,dx,$$

where $\lambda_0, \ldots, \lambda_m$ are the Lagrange multipliers, and then solve the equation

$$\delta\mathcal{H} = \int_0^\infty \left[ -\ln g_{BK}(x) + \lambda_0 + \sum_{i=1}^m \lambda_i (x - K_i)^+ \right]\delta g_{BK}(x)\,dx = 0.$$

The solution is given by

$$g_{BK}(x) = \frac{1}{\mu}\,e^{\sum_{i=1}^m \lambda_i (x - K_i)^+} \qquad \forall x \in [0,\infty[, \tag{20}$$

where $\mu := e^{-\lambda_0} = \int_0^\infty e^{\sum_{i=1}^m \lambda_i (x - K_i)^+}dx$ is a normalizing constant.

Buchen and Kelly show that numerically, finding the parameters $\lambda_1, \ldots, \lambda_m$ is an $m$-dimensional
root-finding problem that can be tackled with the multi-dimensional Newton algorithm. They show how to
compute the Jacobian, and that it is invertible, by expressing it as a covariance matrix.
If a call option with strike $K_1 = 0$, i.e. the forward, is among the input data, the mean of the
distribution is given. Since the total mass, 1, is also known, we have the two main constraints needed to
apply the proof from subsection 2.2 and can therefore also rigorously find the Entropy Maximizer when
only call options are given as input. Of course, the forward should be known in most situations, so that
this is certainly the most important case.

3.2 Maximum Entropy Distribution With a Given Prior Distribution 

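Similarly, if a prior distribution $p$ is given, the distribution maximizing relative Entropy under the same
constraints is given by

$$g_{BK}(x) = \frac{1}{\mu}\,p(x)\,e^{\sum_{i=1}^m \lambda_i (x - K_i)^+} \qquad \forall x \in [0,\infty[, \tag{21}$$

where $\mu = \int_0^\infty p(x)\,e^{\sum_{i=1}^m \lambda_i (x - K_i)^+}dx$ is again the normalizing constant.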

4 Comparing the Distributions 

In this section we give some numerical examples for the Entropy Maximizers described so far. We suppose 
that the market data is given by 

$$F = 100, \qquad r = 0, \qquad \sigma = 0.25, \qquad T = 1.$$

We assume a flat volatility and make no skew correction when calculating the digital prices in this 
scenario. 



4.1 MED from Calls and Digitals 

We calculate three densities using strikes

• $K_0 = 0$, $K_1 = 100$
• $K_0 = 0$, $K_1 = 60$, $K_2 = 100$, $K_3 = 140$
• $K_0 = 0$, $K_1 = 60$, $K_2 = 80$, $K_3 = 100$, $K_4 = 120$, $K_5 = 140$

Table 1 gives the (undiscounted) option prices we used and the parameters describing the density.

Figure 2 shows the three densities and the actual log-normal density. It can be seen that already with
5 strikes and the forward, the fit of the piecewise-exponential distribution to the log-normal distribution
is very good.
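
The "Entropy" rows of Table 1 can be cross-checked from the density parameters alone. On each bucket ln g(x) = ln α_i + β_i x, so the bucket's contribution to the Entropy is minus (ln α_i times the mass in (3), plus β_i times the first moment in (4)). The sketch below applies this observation, which is a direct consequence of (5), (3) and (4) rather than a formula stated in the text; the function name is an assumption, and the inputs are the rounded one-strike values printed in the table.

```python
import math

def med_entropy(strikes, calls, digitals, alphas, betas):
    """Entropy of a piecewise-exponential density, summed bucket by bucket.

    On [K_i, K_{i+1}[ we have ln g(x) = ln(alpha_i) + beta_i * x, hence
    -int g ln g = -(ln(alpha_i) * mass_i + beta_i * first_moment_i),
    where mass_i and first_moment_i are the right-hand sides of (3) and (4).
    """
    n = len(strikes)
    total = 0.0
    for i in range(n):
        # Convention for the last bucket: C_{n+1} = K_{n+1} D_{n+1} = 0 and D_{n+1} = 0.
        upper_cd = calls[i + 1] + strikes[i + 1] * digitals[i + 1] if i + 1 < n else 0.0
        upper_d = digitals[i + 1] if i + 1 < n else 0.0
        mass = digitals[i] - upper_d
        first_moment = (calls[i] + strikes[i] * digitals[i]) - upper_cd
        total -= math.log(alphas[i]) * mass + betas[i] * first_moment
    return total

# One-strike case with the rounded values printed in Table 1.
entropy_1 = med_entropy([0.0, 100.0], [100.0, 9.9477], [1.0, 0.4503],
                        [1.3582e-4, 1.8835], [0.0539, -0.0453])
```

With these rounded inputs the result agrees with the tabulated value 4.6714 to about two decimal places.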
Table 1: Option Prices and Density Parameters

Strike             0.00      20.00      40.00      60.00      80.00     100.00     120.00     140.00     160.00     180.00

Market:
  Call         100.0000    80.0000    60.0005    40.1454    22.2656     9.9477     3.7059     1.2139     0.3659     0.1049
  Digital        1.0000     1.0000     0.9998     0.9725     0.7786     0.4503     0.1965     0.0707     0.0225     0.0066

ME 1 Strike:
  Entropy        4.6714
  α          1.3582E-04        n/a        n/a        n/a        n/a     1.8835        n/a        n/a        n/a        n/a
  β              0.0539        n/a        n/a        n/a        n/a    -0.0453        n/a        n/a        n/a        n/a
  Call         100.0000    80.0402    60.2562    40.9886    23.2384     9.9477     4.0232     1.6271     0.6581     0.2661
  Impl.Vol.         n/a     0.6213     0.4626     0.3617     0.2888     0.2500     0.2595     0.2704     0.2784     0.2841
  Digital        1.0000     0.9951     0.9808     0.9386     0.8146     0.4503     0.1821     0.0736     0.0298     0.0120

ME 3 Strikes:
  Entropy        4.6143
  α          6.0682E-08        n/a        n/a     0.0016        n/a     0.5397        n/a    14.2333        n/a        n/a
  β              0.1894        n/a        n/a     0.0255        n/a    -0.0343        n/a    -0.0582        n/a        n/a
  Call         100.0000    80.0001    60.0033    40.1454    22.4905     9.9477     3.7539     1.2139     0.3790     0.1183
  Impl.Vol.         n/a     0.3876     0.2860     0.2500     0.2593     0.2500     0.2514     0.2500     0.2515     0.2538
  Digital        1.0000     1.0000     0.9994     0.9725     0.7765     0.4503     0.1978     0.0707     0.0221     0.0069

ME 5 Strikes:
  Entropy        4.6076
  α          6.0682E-08        n/a        n/a 1.5393E-04     0.0129     0.2389     1.6987    14.2333        n/a        n/a
  β              0.1894        n/a        n/a     0.0584     0.0027    -0.0268    -0.0433    -0.0582        n/a        n/a
  Call         100.0000    80.0001    60.0033    40.1454    22.2656     9.9477     3.7059     1.2139     0.3790     0.1183
  Impl.Vol.         n/a     0.3876     0.2860     0.2500     0.2500     0.2500     0.2500     0.2500     0.2515     0.2538
  Digital        1.0000     1.0000     0.9994     0.9725     0.7765     0.4503     0.1978     0.0707     0.0221     0.0069


We show in Figure 3 the implied volatility surface obtained by using just one strike $K_1 = 100$ and
the forward, and a constant at-the-money volatility of $\sigma = 0.25$ at each maturity. (Both the forward
and the volatility could of course be time dependent.) The surface is calculated by pricing call options
with the MED and then finding the implied volatility with a bisection root-finder from the Black-Scholes
formula. Readers interested in a more robust method can consult the one proposed in [10].
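
The bisection step described above can be sketched as follows. This is a minimal illustration, not the implementation used for the figures: it assumes undiscounted prices on the forward, and the function names `norm_cdf`, `black_call` and `implied_vol` as well as the bracket `[lo, hi]` and the tolerance are choices made here for the example.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call(forward, strike, vol, maturity):
    """Undiscounted Black-Scholes call price on the forward (assumed convention)."""
    if strike <= 0.0:
        return forward - strike
    s = vol * math.sqrt(maturity)
    if s <= 0.0:
        return max(forward - strike, 0.0)
    d1 = (math.log(forward / strike) + 0.5 * s * s) / s
    return forward * norm_cdf(d1) - strike * norm_cdf(d1 - s)


def implied_vol(price, forward, strike, maturity, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on the volatility; the call price is increasing in vol."""
    low_price = black_call(forward, strike, lo, maturity)
    high_price = black_call(forward, strike, hi, maturity)
    if not low_price <= price <= high_price:
        raise ValueError("price is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if black_call(forward, strike, mid, maturity) < price:
            lo = mid  # model price too low, so the volatility must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions, `implied_vol(9.9477, 100.0, 100.0, 1.0)` recovers a volatility of approximately 0.25 from the at-the-money call price listed in the tables.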

The Maximum Entropy method seems to be able to transform just one volatility number from a flat 
Black-Scholes world into a very realistic looking volatility surface, with important features such as a 
strongly pronounced smile at the short end that decays as the maturity increases. 

Different strikes and different numbers of option prices can of course be used at different maturities, 
so that any arbitrage-free option data can easily be converted into an implied volatility surface. 

The density $g$ is usually discontinuous at the $K_i$'s. The distribution function is of course continuous.
Many Monte Carlo models work by drawing a random uniform variable and inverting the distribution. 
In Black-Scholes type models, for example, a normal distribution has to be inverted at some stage. In 
our case, only one logarithm needs to be taken, which accelerates a simulation. 
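
A minimal sketch of such an inversion is given below. It assumes that the density has the piecewise exponential form $\alpha_i e^{\beta_i x}$ on $[K_i, K_{i+1}[$ with nonzero $\beta_i$, and it renormalizes the total mass because the tabulated parameters are rounded. The function names `segment_masses`, `inverse_cdf` and `sample` are hypothetical helpers introduced for this example.

```python
import math
import random
from bisect import bisect_right


def segment_masses(knots, alpha, beta):
    """Mass of alpha_i * exp(beta_i * x) on each segment; the last one is unbounded."""
    masses = []
    for i, (a, b) in enumerate(zip(alpha, beta)):
        lower = math.exp(b * knots[i])
        if i + 1 < len(knots):
            upper = math.exp(b * knots[i + 1])
        else:
            if b >= 0.0:
                raise ValueError("the last segment needs a negative exponent")
            upper = 0.0  # limit of exp(b * x) as x tends to infinity
        masses.append(a / b * (upper - lower))
    return masses


def inverse_cdf(u, knots, alpha, beta):
    """Invert the distribution function at u in [0, 1) with a single logarithm."""
    masses = segment_masses(knots, alpha, beta)
    total = sum(masses)  # renormalize, since rounded inputs rarely sum to one
    cum = [0.0]
    for m in masses:
        cum.append(cum[-1] + m / total)
    i = min(bisect_right(cum, u) - 1, len(alpha) - 1)
    a, b = alpha[i] / total, beta[i]
    arg = math.exp(b * knots[i]) + b * (u - cum[i]) / a
    # Guard against rounding when u is within machine precision of 1.
    return math.log(max(arg, 1e-300)) / b


def sample(n, knots, alpha, beta, seed=0):
    """Draw n values by inverting uniform random numbers."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random(), knots, alpha, beta) for _ in range(n)]
```

With the "ME 1 Strike" parameters, `sample(20000, [0.0, 100.0], [1.3582e-4, 1.8835], [0.0539, -0.0453])` produces draws whose average should lie close to the forward of 100. In a production setting the segment masses and the terms `exp(b * knots[i])` would be computed once rather than on every call.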



4.2 MED from Calls and Digitals with Prior Log-Normal Distribution 

Let the prior distribution be a log-normal distribution with fixed volatility parameter $\sigma$

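$$p(x) = \frac{1}{x \sigma \sqrt{2 \pi T}} \exp\left( -\frac{\left( \ln(x/F) + \sigma^2 T / 2 \right)^2}{2 \sigma^2 T} \right) \quad \forall x \in [0, \infty[.$$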

We still have an explicit form of the density, namely $h = gp$, where $g$ is a piecewise exponential density,
although the parameters $\gamma_i$, $\delta_i$ are of course different from the parameters $\alpha_i$, $\beta_i$ used for the MED of the
previous subsection. Since we are now unable to express call prices analytically, we calculate their prices
via numerical integration.
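
A rough sketch of such a numerical integration follows. It rests on several assumptions that are not taken from the text: the prior is the log-normal density with mean equal to the forward, the factor $g$ equals $\gamma_i e^{\delta_i x}$ on $[K_i, K_{i+1}[$, prices are undiscounted, and the integral is truncated at a heuristic upper bound. The names `lognormal_pdf`, `g_factor` and `call_price` are hypothetical.

```python
import math
from bisect import bisect_right


def lognormal_pdf(x, forward, vol, maturity):
    """Assumed prior: log-normal density with mean equal to the forward."""
    if x <= 0.0:
        return 0.0
    s = vol * math.sqrt(maturity)
    z = (math.log(x / forward) + 0.5 * s * s) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))


def g_factor(x, knots, gamma, delta):
    """Piecewise exponential factor gamma_i * exp(delta_i * x) on [K_i, K_{i+1})."""
    i = max(bisect_right(knots, x) - 1, 0)
    return gamma[i] * math.exp(delta[i] * x)


def call_price(strike, knots, gamma, delta, forward, vol, maturity, n_steps=20000):
    """Undiscounted call price under h = g * p via the composite trapezoidal rule."""
    s = vol * math.sqrt(maturity)
    upper = forward * math.exp(10.0 * s)  # heuristic truncation point (assumption)
    if strike >= upper:
        return 0.0
    step = (upper - strike) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        x = strike + k * step
        weight = 0.5 if k in (0, n_steps) else 1.0
        density = g_factor(x, knots, gamma, delta) * lognormal_pdf(x, forward, vol, maturity)
        total += weight * (x - strike) * density
    return total * step
```

As a sanity check, the trivial factor ($\gamma = 1$, $\delta = 0$) leaves the prior unchanged, so `call_price(100.0, [0.0], [1.0], [0.0], 100.0, 0.20, 1.0)` should reproduce the at-the-money Black-Scholes price for a volatility of 0.20. Calibrated parameters $\gamma_i$, $\delta_i$ can be passed in the same way, with `knots` holding the strikes at which the segments begin.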
Figure 2: Three ME densities and the actual log-normal density. (Curves shown: Density 1 Strike,
Density 3 Strikes, Density 5 Strikes, Log-normal Density; horizontal axis from 5 to 225, vertical axis
from 0.00 to 0.03.)



Table 2 gives the parameters describing the density. Of course, should a prior density already meet
the constraints, we will have $\gamma_i = 1$ and $\delta_i = 0$ for all $i$.



Figure 4 shows the three Maximum Relative Entropy densities and the prior log-normal density
($F = 100$, $\sigma = 0.20$). The density with one strike and the forward is already much closer to the actual
one than in the previous case, so that convergence isn't as pronounced as before when the number of
strikes is increased. The resulting volatility smile is much flatter. We see that $g$ has the effect of pushing
the prior density downwards and widening it so as to be closer to the actual density.



4.3 MED from Calls 

The explicit form of the density given by equation (20) allows one to obtain analytic expressions for
call and digital prices like those in Proposition 2.8. As an example, using just the forward and an
at-the-money call, i.e. $K_1 = 0$, $K_2 = 100$, we obtained

$\lambda_1 = 0.048747$, $\lambda_2 = -0.098626$, with normalizing constant $5290.62$

on our computer. This leads to a volatility smile very similar to the one given at $T = 1$ in subsection
4.1 above. We refer to [5] for graphs and numerical data regarding this distribution.
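
The quoted numbers can be checked with a few closed-form integrals. The sketch below assumes that the density of equation (20) is proportional to $\exp(\lambda_1 x + \lambda_2 (x - K_2)^+)$ on $[0, \infty[$, which is the Buchen-Kelly form for a forward and one call constraint; this form and the helper name `two_constraint_moments` are assumptions made here, since the equation itself is not reproduced in this section.

```python
import math


def two_constraint_moments(lam1, lam2, strike):
    """Normalizer, mean and undiscounted call(strike) of the assumed density
    exp(lam1 * x + lam2 * max(x - strike, 0)) / Z on [0, infinity)."""
    tail = lam1 + lam2  # exponent rate beyond the strike
    if lam1 == 0.0 or tail >= 0.0:
        raise ValueError("need lam1 != 0 and lam1 + lam2 < 0")
    e = math.exp(lam1 * strike)
    # Z = integral of exp(lam1 * x) on [0, strike] plus the exponential tail.
    z = (e - 1.0) / lam1 - e / tail
    # Integral of x * density on [0, strike], then on [strike, infinity).
    below = e * (strike / lam1 - 1.0 / lam1 ** 2) + 1.0 / lam1 ** 2
    above = e * (-strike / tail + 1.0 / tail ** 2)
    mean = (below + above) / z
    # Integral of (x - strike) * density on [strike, infinity).
    call = e / tail ** 2 / z
    return z, mean, call
```

Calling `two_constraint_moments(0.048747, -0.098626, 100.0)` should return a normalizer close to 5290.6, a mean close to the forward of 100, and a call price close to the at-the-money value 9.9477, up to the rounding of the quoted parameters.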
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Figure 3: Implied volatility surface obtained by calibrating only to the (constant) at-the-money volatility 
curve. 

4.4 MED from Calls with Prior Log-Normal Distribution 

As in subsection 3.2, in general there will be no analytic expressions for call or digital prices. If the chosen
prior distribution is continuous, then the resulting Relative Entropy Maximizer will also be continuous.
Again, we advise the reader to look at [3] for graphs and numerical data regarding this distribution.



5 Some Remarks on Other Implied Distributions 

In most situations the information observed in the market regarding an asset consists of option prices
at a discrete set of strikes. Using this to extrapolate the second derivative of a function everywhere, as
suggested by Breeden and Litzenberger [3] or the Local Volatility approach, relies on additional
assumptions about the distribution of returns, the SDE the asset follows and/or the choice of an
interpolation method. Even when there are strong reasons for such assumptions, we believe that it is
important to know the shape of the distribution function given by the Principle of Maximum Entropy (PME)
in case these assumptions turn out to be flawed.

We are aware of the dearth of quoted digital prices in the market. And we agree that in practice 
these prices often come from an artificial source like smile interpolation. However, we believe that our 
approach still has advantages over Derman and Kani's [8] and Dupire's [9]. 
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Table 2: Maximum Relative Entropy Density Parameters 



strike     0.00        60.00       80.00       100.00      120.00      140.00

Relative ME 1 Strike:
γ          12.2600     n/a         n/a         0.0833      n/a         n/a
δ          [illegible: "n none"]  n/a  n/a     [illegible: "U.UZUD"]  n/a  n/a

Relative ME 3 Strikes:
γ          11.2900     7.2379      n/a         0.2930      n/a         0.3267
δ          -0.0194     -0.0237     n/a         0.0098      n/a         0.0116

Relative ME 5 Strikes:
γ          11.2900     5.9910      2.0430      0.5970      0.5815      0.3267
δ          -0.0194     -0.0210     -0.0097     0.0031      0.0047      0.0116



In the Local Volatility model, call prices for all strikes in $[0, \infty[$ are needed and, additionally,
the smile volatility is assumed to be twice continuously differentiable. Hence this approach requires an
infinity of non-quoted prices together with a strong regularity. "Since the market provides call prices
at only a small number of strike prices, the second derivative must be estimated by interpolation. This
method is not very robust as the results are very sensitive to the interpolation scheme used." [4]

In our case, the artificial data required consists only of digital prices for a finite, usually small, set
of strikes $\mathcal{K} = \{K_1, \dots, K_N\}$. If one assumes (and our approach does not) that the volatility smile is
differentiable with respect to the strike at the points of $\mathcal{K}$, then prescribing digital prices there is indeed
equivalent to prescribing the value of the smile derivative. This is still a much weaker requirement than
that of the Local Volatility model.
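
The equivalence can be made explicit with a short chain-rule computation. The sketch below assumes undiscounted prices and writes $C_{BS}$ for the Black-Scholes call price on the forward $F$, $\sigma(K)$ for the smile at a fixed maturity $T$, $\varphi$ and $N$ for the standard normal density and distribution function, and $D(K)$ for the digital price; this notation is introduced here for illustration.

```latex
% Call price expressed through the smile, and the digital as minus its strike derivative
C(K) = C_{BS}\bigl(F, K, \sigma(K), T\bigr), \qquad D(K) = -\frac{dC}{dK}(K)

% Chain rule, using -\partial C_{BS}/\partial K = N(d_2) and \partial C_{BS}/\partial\sigma = F \varphi(d_1) \sqrt{T}
D(K) = N(d_2) - F \varphi(d_1) \sqrt{T} \, \sigma'(K),
\qquad d_{1,2} = \frac{\ln(F/K) \pm \tfrac{1}{2} \sigma(K)^2 T}{\sigma(K) \sqrt{T}}

% Solving for the slope of the smile at the strike K
\sigma'(K) = \frac{N(d_2) - D(K)}{F \varphi(d_1) \sqrt{T}}
```

For instance, with $F = K = 100$, $T = 1$ and $\sigma = 0.25$ one finds $N(d_2) = N(-0.125) \approx 0.4503$, which coincides with the at-the-money digital price used in the numerical results, so those inputs correspond to a smile slope of approximately zero at the money.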

6 Conclusion 

Entropy has been one of the main concepts in Information Theory, and since market participants react to 
information when taking their positions, we believe Entropy is a very natural tool to be used in Finance. 

In the article's introduction we provide a brief explanation of the meaning of Entropy. Essentially, 
it is a measure of how unbiased a probability distribution is. Hence, by maximizing Entropy, what we 
propose is to find the most unbiased probability distribution which agrees with information provided 
from the market. We then show how this hypothesis leads to a piecewise exponential density. 

The method we propose can be used reliably and efficiently in practice. On the one hand, we have 
seen that it produces a remarkably realistic volatility surface from just one volatility number as in the 
original Black-Scholes model, with a steep skew for short maturities that decays over time. On the other 
hand, if the actual distribution is known, then with option prices given at five or more strikes, the fit to 
it is very close. In particular, it can be used as a robust interpolation method for volatility curves. 

If additionally there is knowledge of a prior distribution, the Principle of Maximum Relative Entropy 
can be applied to find a density that takes this into account and also meets the new constraints. We 
give an example of such a scenario with two log-normal distributions, and show that the convergence to 
the actual distribution is particularly quick. 

Buchen and Kelly have proposed a similar method of finding a probability density that maximizes 
Entropy when the market data consists only of call options. The density they obtain is continuous. 
Figure 4: Graphs of the actual log-normal density with $\sigma = 25\%$, prior log-normal density with $\sigma = 20\%$
and three Relative Entropy Maximizers obtained by calibrating to 1, 3 and 5 strikes.

However, to find its parameters they must solve a multi-dimensional root-finding problem with the 
Newton-Raphson algorithm. 

One criticism often raised in this application of the PME is that the method of finding the form of
the density uses Lagrange multipliers and is not rigorous. Indeed, this technique works well in practice
and leads to the correct form, but we fill the gap by giving a complete mathematical proof that avoids
them. Relative Entropy has often been compared to a metric for probability distributions. Our proof
uses results by Csiszár that give additional insights into "distances" between distributions and establish
remarkable "geometric" results.

Since we have an explicit form of the density, we are able to give analytical formulas for the
distribution, inverse distribution, and call and digital option prices. Using Euler's relation for homogeneous
functions, we give formulas for spot- and forward-deltas.
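
As a reminder of how Euler's relation produces a delta, the following sketch assumes undiscounted prices and a call price $C(F, K)$ that is homogeneous of degree one in forward and strike. The symbols $\Delta_F$ for the forward-delta and $D$ for the digital price are illustrative notation and need not match the formulas given earlier in the paper.

```latex
% Homogeneity of degree one in (F, K)
C(\lambda F, \lambda K) = \lambda \, C(F, K) \quad \text{for all } \lambda > 0

% Euler's relation, obtained by differentiating with respect to \lambda at \lambda = 1
C = F \frac{\partial C}{\partial F} + K \frac{\partial C}{\partial K}

% With D = -\partial C / \partial K, the forward-delta follows from prices alone
\Delta_F = \frac{\partial C}{\partial F} = \frac{C + K D}{F}
```

With the at-the-money values $C = 9.9477$ and $D = 0.4503$ at $F = K = 100$ this gives $\Delta_F \approx 0.5498$, close to the Black-Scholes value $N(0.125) \approx 0.5497$ for $\sigma = 0.25$ and $T = 1$.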
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